Correcting Arabic Soft Spelling Mistakes using BiLSTM-based Machine Learning

Abstract

Soft spelling mistakes are a class of spelling mistakes that is widespread among native Arabic speakers and foreign learners alike. Some of these mistakes are typographical in nature. They occur because of orthographic variations of some letters and the complex rules that dictate their correct usage. Many people forgo these rules and, given the identical phonetic sounds, often confuse such letters. In this paper, we investigate how to use machine learning to correct these mistakes when there are no sufficient datasets to train correction models. Spelling error detection is an active field in natural language processing. We generate training data using the proposed transformed input approach and a stochastic error injection approach. These approaches are applied to two acclaimed datasets that represent Classical Arabic and Modern Standard Arabic. We treat the problem as a character-level, one-to-one sequence transcription problem. Although soft mistakes include omissions and deletions, this formulation is possible with the adopted simple transformations. It permits bidirectional long short-term memory (BiLSTM) models, which are more effective than other alternatives such as encoder-decoder models. Based on investigating multiple alternatives, we recommend a configuration that uses BiLSTM layers and is trained with an error injection rate of 40%. The best model corrects 96.4% of the injected errors and achieves a low character error rate of 1.28% on a real test set of soft spelling mistakes.
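The abstract frames correction as character-level, one-to-one transcription trained on synthetic errors and scored by character error rate. The Python sketch below illustrates two of those ingredients under stated assumptions: a stochastic injector that swaps letters within small confusion sets at a given rate, and a Levenshtein-based character error rate. The confusion sets, the per-character reading of the injection rate, and the names `inject_soft_errors`, `levenshtein`, and `character_error_rate` are hypothetical; this is not the authors' implementation, and their error model (which also covers omissions) is not reproduced here.

```python
import random

# Hypothetical confusion sets, for illustration only: groups of Arabic
# letters that are often confused because they sound alike or differ only
# in hamza/madda marks or dots. They are not taken from the paper.
CONFUSION_SETS = [
    "اأإآ",  # bare alif, alif with hamza above/below, alif with madda
    "ةه",    # taa marbuta vs. haa
    "ىي",    # alif maqsura vs. yaa
]

# Map every confusable letter to the other letters in its set.
CONFUSABLE = {
    ch: [other for other in group if other != ch]
    for group in CONFUSION_SETS
    for ch in group
}


def inject_soft_errors(text: str, rate: float, rng: random.Random) -> str:
    """Return a noisy copy of `text` for training.

    Assumption: `rate` is the probability that each confusable character is
    replaced by a different letter from its confusion set. Only substitutions
    are injected, so the noisy text has the same length as the clean text and
    each noisy character aligns with exactly one target character.
    """
    noisy = []
    for ch in text:
        alternatives = CONFUSABLE.get(ch)
        if alternatives and rng.random() < rate:
            noisy.append(rng.choice(alternatives))
        else:
            noisy.append(ch)
    return "".join(noisy)


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def character_error_rate(hypothesis: str, reference: str) -> float:
    """Character error rate: edit distance divided by the reference length."""
    return levenshtein(hypothesis, reference) / max(len(reference), 1)


if __name__ == "__main__":
    rng = random.Random(0)
    clean = "مدرسة إلى أين"
    # 40% mirrors the rate recommended in the abstract; its exact definition is assumed.
    noisy = inject_soft_errors(clean, rate=0.4, rng=rng)
    print(noisy, len(noisy) == len(clean))
    print(f"CER of noisy input: {character_error_rate(noisy, clean):.2%}")
```

Restricting the sketch to substitutions keeps input and target the same length, which is what lets a BiLSTM emit one output character per input character. Errors that change a word's length, such as omissions, would need extra transformations like the ones the abstract mentions; they are not modeled here.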

Similar Resources

Arabic Spelling Correction using Supervised Learning

In this work, we address the problem of spelling correction in the Arabic language utilizing the new corpus provided by QALB (Qatar Arabic Language Bank) project which is an annotated corpus of sentences with errors and their corrections. The corpus contains edit, add before, split, merge, add after, move and other error types. We are concerned with the first four error types as they contribute...

Arabic Text Categorization using Machine Learning Approaches

Arabic Text categorization is considered one of the severe problems in classification using machine learning algorithms. Achieving high accuracy in Arabic text categorization depends on the preprocessing techniques used to prepare the data set. Thus, in this paper, an investigation of the impact of the preprocessing methods concerning the performance of three machine learning algorithms, namely...

Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media

Social media allows people to interact and express their thoughts or feelings about different subjects. However, some users may write offensive tweets to others via social media, which is known as cyberbullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "tweets" via one of the machine l...

Motor Control: Correcting Errors and Learning from Mistakes

How do we learn from errors during complex movement tasks with redundancy? A new study shows that ambiguous mistakes in bimanual movements are corrected by the non-dominant hand, and responsibility for the error is assumed to fall to the effector with a recent history of poor performance.

Georeferencing Semi-Structured Place-Based Web Resources Using Machine Learning

In recent years, the content shared on the web has grown significantly. A large part of this information is publicly available in the form of semi-structured data. Moreover, a significant amount of this information is related to place. Such information refers to a location on the earth; however, it does not contain any explicit coordinates. In this research, we tried to georefer...

Journal

Journal title: International Journal of Advanced Computer Science and Applications

Year: 2022

ISSN: 2158-107X, 2156-5570

DOI: https://doi.org/10.14569/ijacsa.2022.0130594